10 research outputs found

    Ribosome signatures aid bacterial translation initiation site identification

    Background: While methods for annotation of genes are increasingly reliable, the exact identification of translation initiation sites remains a challenging problem. Since the N-termini of proteins often contain regulatory and targeting information, developing a robust method for start site identification is crucial. Ribosome profiling reads show distinct patterns of read length distributions around translation initiation sites. These patterns are typically lost in standard ribosome profiling analysis pipelines, when reads from footprints are adjusted to determine the specific codon being translated. Results: Utilising these signatures in combination with nucleotide sequence information, we build a model capable of predicting translation initiation sites and demonstrate its high accuracy using N-terminal proteomics. Applying this to prokaryotic translatomes, we re-annotate translation initiation sites and provide evidence of N-terminal truncations and extensions of previously annotated coding sequences. These re-annotations are supported by the presence of structural and sequence-based features next to N-terminal peptide evidence. Finally, our model identifies 61 novel genes previously undiscovered in the Salmonella enterica genome. Conclusions: Signatures within ribosome profiling read length distributions can be used in combination with nucleotide sequence information to provide accurate genome-wide identification of translation initiation sites.

    Deep conservation of ribosome stall sites across RNA processing genes

    The rate of translation can vary depending on the mRNA template. During the elongation phase the ribosome can transiently pause or permanently stall. A pause can provide the nascent protein with the time to fold or be transported, while stalling can serve as quality control and trigger degradation of aberrant mRNA and peptide. Ribosome profiling has allowed for the genome-wide detection of such pauses and stalls, but due to library-specific biases, these predictions are often unreliable. Here, we take advantage of the deep conservation of protein synthesis machinery, hypothesizing that similar conservation could exist for functionally important locations of ribosome slowdown, here collectively called stall sites. We analyze multiple ribosome profiling datasets from phylogenetically diverse eukaryotes (yeast, fruit fly, zebrafish, mouse and human) to identify conserved stall sites. We find thousands of stall sites across multiple species, with enrichment of proline, glycine and negatively charged amino acids around conserved stall sites. Many of the sites are found in RNA processing genes, suggesting that stalling might have a conserved role in RNA metabolism. In summary, our results provide a rich resource for the study of conserved stalling and indicate possible roles of stalling in gene regulation.

    Insights into translational regulation from ribosome profiling data

    Ribosomes carry out protein synthesis from mRNA templates by a highly regulated process called translation. Within the four phases of translation - initiation, elongation, termination and recycling - the focus of translation regulation studies has traditionally fallen on initiation as the rate-limiting step in protein production. Recent evidence, however, points to the profound importance of regulatory control of elongation during development, neurologic disease, cell stress and even cancer. Ribosome profiling provides an unprecedented means of studying translational regulation on a global level. It is based on deep sequencing of ribosome-protected mRNA fragments, capturing snapshots of genome-wide translation. However, as with any new experimental technique, biases inherent in the ribosome profiling method are gradually being explored and understood, and serve to inform further refinement of the technique. In the first part of this thesis I provide a comprehensive overview of the current state of knowledge on translation and its regulation, particularly at the elongation phase. I describe the ribosome profiling technique, data processing and applications to studying translational regulation. Afterwards, I go on to present the results in the form of two scientific papers. The first paper tackles the challenge of ribosome profiling data processing, laying the groundwork for the second paper. The second paper uses the improved processing to explore ribosome stalling and its potential regulatory functions. The first paper presents Shoelaces, a tool for processing and visualization of ribosome profiling data. Here, I demonstrate how streamlining and standardizing processing steps can contribute to better quality and comparability of data for downstream analyses. At the core of this are (1) filtering genuine translating footprints from noise based on periodicity and (2) determining the specific codon being translated by the ribosome through length-dependent offset calculation. Shoelaces automatically selects footprint lengths and offsets, offering a user-friendly graphical interface as well as a command line interface for batch processing. By reanalyzing 79 human libraries, I show that Shoelaces retains more quality data than the original manual analyses. In the second paper, I investigate regulation of translation elongation by ribosome stalling. Utilizing the robust processing technique developed in the first paper, I process 20 ribosome profiling datasets from yeast, fruit fly, zebrafish, mouse and human. Hypothesising that deep conservation of translation machinery would exist also for biologically significant stall sites, I detect 3293 such sites conserved in at least two organisms. I find that proline and negatively charged amino acids are the main contributors to stalling. Furthermore, many of the stall sites are found in RNA processing genes, suggesting that stalling might play a conserved regulatory role in RNA metabolism. The project provides a rich resource for further in-depth studies on conserved stalling and suggests its possible roles in regulation of translation elongation. Finally, the last part of this thesis consists of concluding remarks and a critical reflection on the impact of these projects on the field. Here, I point out possible directions for future investigations. Additionally, I include a related paper on the use of ribosome profiling data of initiating ribosomes in the re-annotation of bacterial genomes. Overall, this thesis demonstrates how mining ribosome profiling data can result in biologically meaningful discoveries pertaining to regulation of translation.

    Shoelaces: an interactive tool for ribosome profiling processing and visualization

    Background: The emergence of ribosome profiling to map actively translating ribosomes has laid the foundation for a diverse range of studies on translational regulation. The data obtained with different variations of this assay is typically manually processed, which has created a need for tools that would streamline and standardize processing steps. Results: We present Shoelaces, a toolkit for ribosome profiling experiments automating read selection and filtering to obtain genuine translating footprints. Based on periodicity, favoring enrichment over the coding regions, it determines the read lengths corresponding to bona fide ribosome protected fragments. The specific codon under translation (P-site) is determined by automatic offset calculations resulting in sub-codon resolution. Shoelaces provides both a user-friendly graphical interface for interactive visualisation in a genome browser-like fashion and a command line interface for integration into automated pipelines. We process 79 libraries and show that studies typically discard excessive amounts of quality data in their manual analysis pipelines. Conclusions: Shoelaces streamlines ribosome profiling analysis offering automation of the processing, a range of interactive visualization features and export of the data into standard formats. Shoelaces stores all processing steps performed in an XML file that can be used by other groups to exactly reproduce the processing of a given study. We therefore anticipate that Shoelaces can aid researchers by automating what is typically performed manually and contribute to the overall reproducibility of studies. The tool is freely distributed as a Python package, with additional instructions, tutorial and demo datasets available at https://bitbucket.org/valenlab/shoelaces
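
    The two processing steps named in this abstract, periodicity-based read length selection and per-length P-site offset calculation, can be illustrated with a minimal Python sketch. This is not Shoelaces' code or API: the function names, the input format (a list of `(rel_pos, length)` pairs, where `rel_pos` is the 5' end of a read relative to the first nucleotide of the annotated start codon), the thresholds and the search window are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def frame_counts(reads):
    """Count 5' ends per reading frame (0, 1, 2) for each read length.

    Only reads whose 5' end lies at or after the start codon are counted.
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for rel_pos, length in reads:
        if rel_pos >= 0:
            counts[length][rel_pos % 3] += 1
    return dict(counts)


def periodic_lengths(reads, min_fraction=0.5, min_reads=10):
    """Keep read lengths whose dominant frame holds >= min_fraction of reads.

    Both thresholds are hypothetical defaults, not values from the source.
    """
    selected = []
    for length, frames in frame_counts(reads).items():
        total = sum(frames)
        if total >= min_reads and max(frames) / total >= min_fraction:
            selected.append(length)
    return sorted(selected)


def psite_offsets(reads, window=(-20, -5)):
    """Estimate a P-site offset for each read length.

    Assumes initiating ribosomes hold the start codon in the P-site, so the
    most common 5' end position upstream of the start codon (within the
    assumed window) gives the 5'-end-to-P-site distance for that length.
    """
    per_length = defaultdict(Counter)
    for rel_pos, length in reads:
        if window[0] <= rel_pos <= window[1]:
            per_length[length][rel_pos] += 1
    return {length: -positions.most_common(1)[0][0]
            for length, positions in per_length.items()}


# Toy data: 28-nt reads are periodic with an initiation peak 12 nt upstream;
# 31-nt reads are spread evenly over the three frames (noise).
toy_reads = (
    [(-12, 28)] * 30 + [(-13, 28)] * 5
    + [(3 * i, 28) for i in range(1, 21)] + [(4, 28), (5, 28)]
    + [(i, 31) for i in range(30)]
)
kept_lengths = periodic_lengths(toy_reads)
offsets = psite_offsets(toy_reads)
```

    On the toy data, only the 28-nt reads pass the periodicity filter and they receive an offset of 12 nt. A real pipeline would work from aligned reads and transcript annotation rather than precomputed relative positions.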

    Additional file 1 of Shoelaces: an interactive tool for ribosome profiling processing and visualization

    Analysis examples. Figures S1-S3. Three different examples of offset selection (PDF file) for human ribosome profiling datasets: SRR493747 [15], treated with harringtonine and cycloheximide; SRR1039861 [22], treated with cycloheximide; SRR592961 [20], no drug. Table S1. Comparison of footprint lengths as originally selected in human ribosome profiling studies and as selected by Shoelaces. Figure S4. Comparison of reads mapping to different parts of the transcript as selected by Shoelaces and by the original manual selection (SRR493747 [15]). (PDF 8213 kb)

    ORFik: a comprehensive R toolkit for the analysis of translation

    Background: With the rapid growth in the use of high-throughput methods for characterizing translation and the continued expansion of multi-omics, there is a need for back-end functions and streamlined tools for processing, analyzing, and characterizing data produced by these assays. Results: Here, we introduce ORFik, a user-friendly R/Bioconductor API and toolbox for studying translation and its regulation. It extends GenomicRanges from the genome to the transcriptome and implements a framework that integrates data from several sources. ORFik streamlines the steps to process, analyze, and visualize the different steps of translation with a particular focus on initiation and elongation. It accepts high-throughput sequencing data from ribosome profiling to quantify ribosome elongation or RCP-seq/TCP-seq to also quantify ribosome scanning. In addition, ORFik can use CAGE data to accurately determine 5′ UTRs and RNA-seq for determining translation relative to RNA abundance. ORFik supports and calculates over 30 different translation-related features and metrics from the literature and can annotate translated regions such as proteins or upstream open reading frames (uORFs). As a use-case, we demonstrate using ORFik to rapidly annotate the dynamics of 5′ UTRs across different tissues, detect their uORFs, and characterize their scanning and translation in the downstream protein-coding regions. Conclusion: In summary, ORFik introduces hundreds of tested, documented and optimized methods. ORFik is designed to be easily customizable, enabling users to create complete workflows from raw data to publication-ready figures for several types of sequencing data. Finally, by improving speed and scope of many core Bioconductor functions, ORFik offers enhancement benefiting the entire Bioconductor environment. Availability: http://bioconductor.org/packages/ORFik

    Additional file 1: of Ribosome signatures aid bacterial translation initiation site identification

    Figures S1–S9. Figure S1. Ribo-seq meta profiles at start and stop codons for S. Typhimurium. Figure S2. Read length distributions at Shine–Dalgarno motifs. Figure S3. Codon-specific read length distributions. Figure S4. Additional prediction support. Figure S5. Third codon periodicity and GC content. Figure S6. Evidence for predicted novel translation initiation sites. Figure S7. Ribo-seq meta profiles at start codons for E. coli. Figure S8. Library read length distributions. Figure S9. Read length adjustments. (PDF 4700 kb)

    Additional file 2: of Ribosome signatures aid bacterial translation initiation site identification

    Tables S1–S18. Table S1. Variable importance in the S. Typhimurium monosome sample. Table S2. Variable importance in the S. Typhimurium polysome sample. Table S3. N-terminal support for S. Typhimurium predicted ORFs. Table S4. Predicted ORFs from the S. Typhimurium dataset. Table S5. Assessment of the contribution of parameter types to the predictive performance. Table S6. Support for novel predicted ORFs. Table S7. Ribo-seq sample info. Table S8. Variable importance in the E. coli TET2 sample. Table S9. Variable importance in the E. coli TET3 sample. Table S10. Variable importance in the E. coli Li1 sample. Table S11. Variable importance in the E. coli Li3 sample. Table S12. Variable importance in the E. coli Mohammad1 sample. Table S13. Variable importance in the E. coli Mohammad2 sample. Table S14. ORF predictions in the E. coli tetracycline libraries. Table S15. ORF predictions in the E. coli Li libraries. Table S16. ORF predictions in the E. coli Mohammad libraries. Table S17. Blocked N-terminal peptides. Table S18. High confidence N-terminal peptides. (XLSX 321 kb)